These files contain complete loan data for all loans issued from 2007 through 2015, including the current loan status ('Current', 'Late', 'Fully Paid', etc.) and the latest payment information. Additional features include credit scores, the number of finance inquiries, address details (zip code and state), and collections, among others. The file is a matrix of about 890 thousand observations and 75 variables. Here, we use a previously transformed data set, which is, however, a full copy of the original one. For more information, or if you want to download these data, consult:
In [1]:
# Required Libraries
import os
import pandas as pd
import numpy as np
In [2]:
# Path Definitions of Required Data Sets
loan_df_path = os.path.join('/media/ML_HOME/ML-Data_Repository/data', 'loan_df')
us_states_GeoJSON = os.path.join('/media/ML_HOME/ML-Data_Repository/maps', 'us_states-albersUSA-Geo.json')
Here, we provide two choropleth maps showing how the Loan Book Value and the Loan Book Volume are distributed across the U.S. States. To build them, we use three ingredients: the "Bokeh" Python library; a GeoJSON file that defines the U.S. State boundaries, produced from a cartographic boundary shapefile provided on the official site of the U.S. Census Bureau; and the Pandas DataFrame grouped_agg_df, in which we aggregate the number and the value of loans per U.S. State. "Bokeh" is a Python library for interactive, browser-based visualizations in the style of D3.
In [3]:
# Load the Data Set of interest
loan_df = pd.read_pickle(loan_df_path)
In [4]:
# A quick look at the available data set
# Note: newer pandas versions replace `null_counts` with `show_counts`
loan_df.info(null_counts=True)
In [5]:
# Compute the "Loan Book Amount & Volume" per "US State"
grouped = loan_df.groupby(by=['addr_state'])
grouped_agg = (grouped[['loan_amnt']].agg(np.sum)
.rename(columns={'loan_amnt': 'loanbook_amnt_per_state'}))
grouped_agg['loanbook_vol_per_state'] = grouped['loan_amnt'].agg(np.count_nonzero)
grouped_agg_df = grouped_agg.reset_index()
grouped_agg_df.head()
Out[5]:
In [6]:
# Preview the first records of the "grouped_agg_df" Data Frame in JSON form.
# The full set of records has already been joined into the GeoJSON Data Source, "us_states_GeoJSON", that we use here.
grouped_agg_df[:5].to_json(orient='records')
Out[6]:
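The join of the per-state records into the GeoJSON file is not shown in this notebook. As a rough sketch of how such a join could work, each record's measures can be copied into the `properties` of the matching GeoJSON feature. Bokeh's `GeoJSONDataSource` exposes feature properties as columns, which is why the plots can refer to `loanbook_amnt_per_state` by name. The helper name `join_records_into_geojson`, the assumption that features carry the state code in a `state` property, and the toy numbers are all illustrative; the actual file may be keyed differently.

```python
import json


def join_records_into_geojson(geojson_text, records,
                              feature_key="state", record_key="addr_state"):
    """Copy each record's measures into the matching feature's properties.

    `feature_key` and `record_key` are assumptions about how the two
    sources identify a state; adjust them to the real files.
    """
    geo = json.loads(geojson_text)
    by_state = {rec[record_key]: rec for rec in records}
    for feature in geo["features"]:
        props = feature.setdefault("properties", {})
        rec = by_state.get(props.get(feature_key))
        if rec is None:
            continue  # no record for this state: leave the feature unchanged
        for name, value in rec.items():
            if name != record_key:
                props[name] = value
    return json.dumps(geo)


# Toy inputs with made-up numbers, only to show the shape of the data.
toy_geojson = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"state": "AK"}, "geometry": None},
        {"type": "Feature", "properties": {"state": "ZZ"}, "geometry": None},
    ],
})
toy_records = [
    {"addr_state": "AK", "loanbook_amnt_per_state": 1000.0,
     "loanbook_vol_per_state": 2},
]

enriched = json.loads(join_records_into_geojson(toy_geojson, toy_records))
```

In the real notebook, `toy_records` would correspond to the output of `grouped_agg_df.to_json(orient='records')` after parsing it with `json.loads`.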
In [7]:
# Load the necessary libraries for the Bokeh visualization
from bokeh.io import show, output_notebook
from bokeh.palettes import (
YlOrRd9 as palette1,
YlGnBu9 as palette2)
from bokeh.plotting import figure
from bokeh.models import (
GeoJSONDataSource,
LogColorMapper,
HoverTool,
LogTicker,
ColorBar)
# Load the enriched GeoJSON Data Source, with the loanbook measures of interest
with open(us_states_GeoJSON, 'r') as f:
    geo_source = GeoJSONDataSource(geojson=f.read())
# Output the Choropleth Plots in Notebook
output_notebook()
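# PROVIDE THE CHOROPLETH OF "LOAN BOOK AMOUNT PER STATE"
# Reverse a copy of the palette: this leaves the imported palette untouched
# (it is shared module-wide and may be an immutable tuple in newer Bokeh versions)
palette1_rev = list(reversed(palette1))
color_mapper = LogColorMapper(palette=palette1_rev,
                              low=grouped_agg_df['loanbook_amnt_per_state'].min(),
                              high=grouped_agg_df['loanbook_amnt_per_state'].max())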
# Define the figure "Tools" we want to make available
TOOLS = "pan, wheel_zoom, reset, hover, save"
# Plot the figure
# Define the figure dimensions and its general details
p = figure(title="Loan Book Value by U.S. States", tools=TOOLS,
plot_width=960, plot_height=500,
x_range=(0, 960), y_range=(500, 0),
x_axis_location=None, y_axis_location=None)
# Render the "Bokeh" patches in Glyph
p.patches('xs', 'ys', source=geo_source,
fill_color={'field': "loanbook_amnt_per_state" ,'transform': color_mapper},
fill_alpha=0.7, line_color="white", line_width=0.5)
# Configure the Hover Tool for the U.S. States
hover = p.select_one(HoverTool)
hover.point_policy = "follow_mouse"
hover.tooltips = [
("State", "@state"),
("Loan Book Amount", "@loanbook_amnt_per_state{,.2f} USD"),
("(Long, Lat)", "($x, $y)"),
]
# Add a ColorBar Legend
color_bar = ColorBar(color_mapper=color_mapper, ticker=LogTicker(),
background_fill_alpha=0.7,
label_standoff=5,
major_label_text_color='black',
major_tick_line_color='black', major_tick_line_width=1.3, major_tick_out=5,
border_line_color=None, location=(0,0),
orientation='horizontal', width=500)
p.add_layout(color_bar, 'above')
show(p)
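The map above uses a logarithmic color mapper because loan book amounts differ by orders of magnitude between states; on a linear scale most states would fall into the lowest color bin. The following is a minimal sketch of that idea, not Bokeh's actual implementation: the function names `log_color_index` and `linear_color_index` and the example bounds are hypothetical, and `low` is assumed to be strictly positive.

```python
import math


def log_color_index(value, low, high, n_colors):
    """Palette index for `value` on a log scale between `low` and `high`.

    Assumes 0 < low < high. Values outside the range are clamped.
    """
    if value <= low:
        return 0
    if value >= high:
        return n_colors - 1
    fraction = (math.log(value) - math.log(low)) / (math.log(high) - math.log(low))
    return min(int(fraction * n_colors), n_colors - 1)


def linear_color_index(value, low, high, n_colors):
    """Same binning on a linear scale, for comparison."""
    if value <= low:
        return 0
    if value >= high:
        return n_colors - 1
    fraction = (value - low) / (high - low)
    return min(int(fraction * n_colors), n_colors - 1)


# Hypothetical bounds and a 9-color palette, as in YlOrRd9.
low, high, n_colors = 1e3, 1e9, 9

mid_log = log_color_index(1e6, low, high, n_colors)        # middle of the palette
mid_linear = linear_color_index(1e6, low, high, n_colors)  # stuck in the first bin
```

With these bounds, a value of one million lands in the middle bin on the log scale but in the very first bin on the linear scale, which is the behavior that motivates `LogColorMapper` here.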
In [8]:
# PROVIDE THE CHOROPLETH OF "LOAN BOOK VOLUME PER STATE"
# Reverse a copy of the palette, so that re-running this cell does not flip it back
palette2_rev = list(reversed(palette2))
color_mapper = LogColorMapper(palette=palette2_rev,
                              low=grouped_agg_df['loanbook_vol_per_state'].min(),
                              high=grouped_agg_df['loanbook_vol_per_state'].max())
# Define the figure "Tools" we want to make available
TOOLS = "pan, wheel_zoom, reset, hover, save"
# Plot the figure
# Define the figure dimensions and its general details
p = figure(title="Loan Book Volume by U.S. States", tools=TOOLS,
plot_width=960, plot_height=500,
x_range=(0, 960), y_range=(500, 0),
x_axis_location=None, y_axis_location=None)
# Render the "Bokeh" patches in Glyph
p.patches('xs', 'ys', source=geo_source,
fill_color={'field': "loanbook_vol_per_state" ,'transform': color_mapper},
fill_alpha=0.7, line_color="white", line_width=0.5)
# Configure the Hover Tool for the U.S. States
hover = p.select_one(HoverTool)
hover.point_policy = "follow_mouse"
hover.tooltips = [
("State", "@state"),
("Loan Book Volume", "@loanbook_vol_per_state{,}"),
("(Long, Lat)", "($x, $y)"),
]
# Add a ColorBar Legend
color_bar = ColorBar(color_mapper=color_mapper, ticker=LogTicker(),
background_fill_alpha=0.7,
label_standoff=5,
major_label_text_color='black',
major_tick_line_color='black', major_tick_line_width=1.3, major_tick_out=5,
border_line_color=None, location=(0,0),
orientation='horizontal', width=500)
p.add_layout(color_bar, 'above')
show(p)